Detections API (v1/text/contents) integration with base interface pattern and extensible client architecture #14
Conversation
3c0323a to
548e85a
Compare
m-misiura
left a comment
You should run pre-commit to make this code adhere to the NeMo style
import logging
from typing import Any, Dict, List

from .base import BaseDetectorClient, DetectorResult
module imports are handled inconsistently, e.g. here they are relative, but in actions.py they are absolute, e.g.
from nemoguardrails.library.detector_clients.base import DetectorResult
from nemoguardrails.library.detector_clients.detections_api import DetectionsAPIClient
Fixed - changed all imports to absolute. actions.py required absolute imports because NeMo's action discovery uses importlib.util.spec_from_file_location() which loads the module without package context, causing relative imports to fail. Updated detections_api.py to match for consistency.
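The failure mode described above can be reproduced in isolation. This is a minimal, hypothetical sketch (file and module names are illustrative) of why a relative import breaks when a module is loaded via importlib.util.spec_from_file_location() without package context:

```python
import importlib.util
import os
import pathlib
import tempfile

# Hypothetical single-file module using a relative import, mimicking detections_api.py
source = "from .base import DetectorResult\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "detections_api.py")
    pathlib.Path(path).write_text(source)

    spec = importlib.util.spec_from_file_location("detections_api", path)
    module = importlib.util.module_from_spec(spec)
    try:
        # Loaded without package context, so the relative import cannot resolve
        spec.loader.exec_module(module)
        relative_import_failed = False
    except ImportError:
        relative_import_failed = True

print(relative_import_failed)  # True
```

Absolute imports sidestep this entirely, which is why the fix standardizes on them.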
Returns:
    DetectorResult with parsed detection outcome
"""
if http_status != 200:
can this distinguish between e.g. 404: Detector not found and 422: Validation error (invalid request)?
if http_status != 200:
return DetectorResult(
allowed=False,
score=0.0,
reason=f"HTTP {http_status} error",
label="ERROR",
detector=self.detector_name,
metadata={"http_status": http_status}
)
see Detector API spec: https://foundation-model-stack.github.io/fms-guardrails-orchestrator/docs/api/openapi_detector_api.yaml
Fixed - added specific handling for 404 (NOT_FOUND) and 422 (VALIDATION_ERROR) per Detections API spec.
in terms of extracting the HTTP status mapping, what do you think about using a dict instead of if/elif?
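As a sketch of the dict-based approach suggested here (the mapping contents and the DetectorResult shape are simplified stand-ins, not the actual implementation):

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class DetectorResult:
    """Simplified stand-in for the real result model."""
    allowed: bool
    score: float
    reason: str
    label: str
    detector: str
    metadata: Dict[str, Any] = field(default_factory=dict)

# Hypothetical status-to-label mapping following the Detector API spec
STATUS_LABELS = {
    404: ("NOT_FOUND", "Detector not found"),
    422: ("VALIDATION_ERROR", "Validation error (invalid request)"),
}

def error_result(detector_name: str, http_status: int) -> DetectorResult:
    # Unmapped statuses fall back to a generic error label
    label, reason = STATUS_LABELS.get(http_status, ("ERROR", f"HTTP {http_status} error"))
    return DetectorResult(
        allowed=False,
        score=0.0,
        reason=reason,
        label=label,
        detector=detector_name,
        metadata={"http_status": http_status},
    )

print(error_result("toxicity", 404).label)  # NOT_FOUND
print(error_result("toxicity", 500).label)  # ERROR
```

A dict keeps the status handling in one place and makes adding new statuses a one-line change.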
scores = [d.get("score", 0.0) for d in detections]
return max(scores) if scores else 0.0

def _calculate_average_score(self, detections: List[Dict[str, Any]]) -> float:
I am not sure how informative it is to calculate average score across detectors; it might be best to remove _calculate_average_score
Fixed - Removed _calculate_average_score() and all average_score references from metadata.
I am not sure if the Colang flow definition is correct or if there is something wrong with the implementation
Re-running the same message gives me inconsistent outputs, e.g. sometimes
- variant 1
{"messages":[{"role":"assistant","content":"I'm sorry, but I couldn't process your request due to the following reason: Blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector. Please feel free to ask something else or try rephrasing your question."}]}
- variant 2:
{"messages":[{"role":"assistant","content":"Sorry, but I'm unable to assist with that request."}]}
- variant 3
{"messages":[{"role":"assistant","content":"This prompt is blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector"}]}
and so on; please investigate
Fixed - changed from bot refuse with message $variable to predefined bot response pattern. The issue was that variable interpolation in bot messages triggered LLM generation with inconsistent fallback paths. Now using define bot blocked by detector with static message, which provides deterministic responses. Verified with multiple identical requests - output is now consistent.
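For reference, a minimal Colang 1.0 sketch of the predefined-bot-response pattern described above (flow and message names are illustrative assumptions, not the actual guide content):

```colang
define bot blocked by detector
  "I'm sorry, I can't respond to that."

define flow detections api input rail
  $result = execute detections_api_check_all_detectors
  if not $result["allowed"]
    bot blocked by detector
    stop
```

Because the bot message is a static canonical form rather than an interpolated variable, no LLM generation is triggered and the refusal text is deterministic.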
}
)

def _extract_detections_from_response(
there may be opportunities to simplify _extract_detections_from_response, since I am not sure the API can return a flat array; please check
Simplified - removed flat array fallback since Detections API spec guarantees nested array structure (array of ContentsAnalysisResponse, which is itself an array).
return response

def _calculate_highest_score(self, detections: List[Dict[str, Any]]) -> float:
is this really necessary or is it possible to use e.g. in-built max function instead of the custom one?
Fixed - Removed _calculate_highest_score() and inlined with built-in max() function using default parameter.
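A quick illustration of the built-in max() with a default parameter replacing the custom helper (the sample detections are made up):

```python
detections = [{"score": 0.2}, {"score": 0.9}, {"score": 0.5}]

# Built-in max with default covers the empty case the custom helper handled
highest = max((d.get("score", 0.0) for d in detections), default=0.0)
print(highest)  # 0.9

# Empty input no longer needs a separate guard clause
empty_highest = max((d.get("score", 0.0) for d in []), default=0.0)
print(empty_highest)  # 0.0
```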
"average_score": average_score,
"individual_scores": individual_scores,
"highest_detection": highest_detection,
"detections": filtered_detections
L176 seems inconsistent with L149?
if metadata is needed, would it be better to have a consistent format where you always display all detectors and then just say pass / fail per detector?
Fixed - standardized metadata structure to be consistent across both cases and added "passed" boolean flag to each detection. The inconsistency was from initially treating BELOW_THRESHOLD as a simpler case, but I agree that consistent structure makes it easier for the end user.
    result = await client.detect(text)
    return result

except Exception as e:
is this dead code as detections_api.py already has try/except that always returns DetectorResult?
Fixed - kept exception handler to catch constructor validation errors (e.g., missing detector_id in ConfigMap). This ensures misconfigured detectors are gracefully marked as unavailable rather than crashing the entire action, allowing other detectors to continue running.
if isinstance(user_message, dict):
    user_message = user_message.get("content", "")

detections_api_detectors = getattr(
a safer access pattern or a proper existence check might be worth implementing here
Fixed - added explicit null-safety checks before accessing config.rails.config to prevent AttributeErrors if configuration chain is incomplete. Returns safe default (allowed: True) with warning log if any part of the config path is missing.
    f"{list(detections_api_detectors.keys())}"
)

tasks_with_names = [
Why store tuples then extract with task[1] and tasks_with_names[i][0]?
Fixed - replaced tuple pattern with separate lists for detector names and tasks. Now using zip(detector_names, results) for explicit association instead of index access (task[1], tasks_with_names[i][0]).
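A minimal sketch of the zip-based pairing (the detect call is stubbed out; names and result fields are illustrative):

```python
import asyncio

async def fake_detect(name: str) -> dict:
    """Stand-in for client.detect(text); fields are illustrative only."""
    return {"detector": name, "allowed": name != "toxicity"}

async def check_all(detector_names):
    tasks = [fake_detect(name) for name in detector_names]
    results = await asyncio.gather(*tasks)
    # zip keeps names and results explicitly paired, with no index arithmetic
    return {name: result for name, result in zip(detector_names, results)}

out = asyncio.run(check_all(["toxicity", "jailbreak"]))
print(out["toxicity"]["allowed"])   # False
print(out["jailbreak"]["allowed"])  # True
```

asyncio.gather preserves input order, so zip(detector_names, results) is a safe association.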
if not config:
    return {"allowed": False, "reason": "No configuration"}

user_message = context.get("user_message", "")
does this mean I can only set this up as an input guardrail?
what if I would like to also configure this for other message types?
Fixed - added support for output guardrails and multiple message types. The actions now automatically detect and check the appropriate message from context:
- user_message for input guardrails
- bot_message for output guardrails
The implementation uses a priority-based approach that works seamlessly in both input and output flows without requiring user configuration.
it seems to me that only input guardrails have been implemented; it would be good to also have this working as output guardrails and work on more than just user_message; perhaps check out how other providers handle this
Fixed - added support for output guardrails and multiple message types. The actions now automatically detect and check the appropriate message from context:
- user_message for input guardrails
- bot_message for output guardrails
is there a HTTP session leak in this file? please investigate
Fixed - added cleanup_http_session() function to properly close the shared aiohttp session during application shutdown. The global session uses lazy initialization with asyncio lock for thread-safe creation and connection pooling across all detector clients.
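The lazy double-checked initialization plus shutdown cleanup described above can be sketched as follows; a fake session class stands in for aiohttp.ClientSession so the example is self-contained:

```python
import asyncio

_session = None
_session_lock = asyncio.Lock()

class FakeSession:
    """Stand-in for aiohttp.ClientSession so the sketch runs without aiohttp."""
    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True

async def get_session():
    global _session
    if _session is None:                  # fast path avoids locking on every call
        async with _session_lock:
            if _session is None:          # double-check inside the lock
                _session = FakeSession()
    return _session

async def cleanup_http_session():
    """Close the shared session during application shutdown."""
    global _session
    if _session is not None:
        await _session.close()
        _session = None

async def demo():
    a = await get_session()
    b = await get_session()
    same = a is b                         # one shared session across all clients
    await cleanup_http_session()
    return same and a.closed

result = asyncio.run(demo())
print(result)  # True
```

The double-check matters because two coroutines can both observe `_session is None` before either acquires the lock; without the second check, the loser would overwrite the winner's session and leak it.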
m-misiura
left a comment
Please go through all the comments; most importantly:
- colang flow in the user guide does not appear to be quite correct (there is stochasticity in the outputs received when sending a request with the same input text)
- I think the current implementation only works on inputs; consider how this could be extended by looking at other providers
- there are some redundancies and unnecessary code; consider removing any dead code
- metadata -- I am not sure if all fields are needed especially things like average score across detectors
- pre commit should be run on all files
- consider adding some unit tests
548e85a to
05b9487
Compare
return None

class KServeDetectorConfig(BaseModel):
is KServeDetectorConfig used anywhere?
Fixed - KServeDetectorConfig was added in my local branch as part of earlier KServe V1 API work and got merged into this PR since both configurations share the same config.py file.
To keep a clean scope for this PR focused solely on Detections API integration, I've removed KServeDetectorConfig and the kserve_detectors field from this PR. These will be added back in the upcoming KServe refactor PR alongside the full implementation.
I think it might be a good idea to rename
    config: DetectionsAPIConfig with endpoint, detector_id, threshold, etc.
"""
super().__init__(config, detector_name)
self.detector_id = getattr(config, "detector_id", "")
why not self.detector_id = config.detector_id instead of self.detector_id = getattr(config, "detector_id", "")?
Fixed - I used getattr() with defaults as defensive programming to handle multiple config types flexibly. However, you're absolutely right that since we're using Pydantic BaseModel configs with validated fields, this defensive approach is unnecessary: Pydantic already ensures these fields exist and have valid values at config creation time.
I've updated both BaseDetectorClient and DetectionsAPIClient to use direct attribute access:
- config.inference_endpoint instead of getattr(config, "inference_endpoint", "")
- config.timeout instead of getattr(config, "timeout", 30)
- And so on for all config fields
I've kept config: Any in BaseDetectorClient to maintain extensibility (no need to modify base class when adding new detector types), while using the specific config: DetectionsAPIConfig type in DetectionsAPIClient for better type safety on detector-specific fields.
The Detections API supports batching multiple texts, but this implementation sends one text per request; is it worthwhile considering supporting this here?
If not, why not?
The Detections API does support batching multiple texts in a single request. However, I chose single-text processing because NeMo's execution model provides one message at a time to detector actions.
Looking at NeMo's architecture and existing detector implementations, actions are invoked once per conversation turn with a single message from context (user_message or bot_message). This pattern is consistent across all built-in detectors—jailbreak_detection, hallucination, content_safety, and others.
NeMo's event-driven architecture creates one UtteranceUserActionFinished event per user utterance, and detector actions execute within that flow receiving a single message. The real-time conversation flow doesn't present a natural batching opportunity where multiple messages would be simultaneously available to process.
Adding batch support would require changing the action signature to handle lists, updating the result aggregation logic, and determining how to source multiple texts within NeMo's per-turn execution model. Given that only one text is available per detector invocation in the current flow, I don't see clear benefits that would justify this additional complexity. What are your thoughts?
"detection_count": 0,
"total_detections": len(detections),
"individual_scores": [d.get("score", 0.0) for d in detections],
"highest_detection": max(detections, key=lambda d: d.get("score", 0.0), default={}),
is this going to work on Python 3.9 ? and if not, does it matter?
Yes, this code is compatible with Python 3.9. However, NeMo Guardrails dropped Python 3.9 support ahead of its EOL in October 2025. The current supported versions per pyproject.toml are Python 3.10, 3.11, 3.12, and 3.13, so I think Python 3.9 compatibility isn't a concern for this codebase.
"""
self.config = config
self.detector_name = detector_name
self.endpoint = getattr(config, "inference_endpoint", "")
self.endpoint = getattr(config, "inference_endpoint", "")
self.timeout = getattr(config, "timeout", 30)
self.api_key = getattr(config, "api_key", None)
If using Pydantic configs, should these be direct attribute access?
Fixed - Addressed in the previous comment
) as response:
    http_status = response.status

    if http_status == 200:
so all non-200 responses raise exceptions and become generic errors?
You're right—the current implementation in base.py raises a generic exception for all non-200 responses, which prevented the specific error handling (404 NOT_FOUND, 422 VALIDATION_ERROR) I added to parse_response() from being reached.
I've updated _call_endpoint() in base.py to return the status code for all HTTP responses instead of raising exceptions. Now:
- base.py handles only network/timeout errors (connection failures, timeouts)
- parse_response() in subclasses handles HTTP status codes with appropriate differentiation
- The specific error labels (NOT_FOUND, VALIDATION_ERROR) are now properly applied based on status code
This allows subclasses to implement status-code-specific error handling while keeping the base class generic.
request_headers.update(headers)

# Add auth if configured (per-detector key or global env var)
token = self.api_key or os.getenv("DETECTIONS_API_KEY")
token = self.api_key or os.getenv("DETECTIONS_API_KEY")

What would happen if I mounted a Secret as a volume? Would authentication fail?
Yes—the current implementation only checks self.api_key (from config) and the DETECTIONS_API_KEY environment variable. If a secret is mounted as a volume, authentication would fail since the code doesn't read from secret files.
I've updated the authentication logic to support file-based secrets. This now supports the standard OpenShift/Kubernetes pattern where secrets are mounted as files at a path specified by DETECTIONS_API_KEY_FILE, while maintaining backward compatibility with environment variables.
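A hedged sketch of the lookup order described above (the function name and exact precedence are illustrative assumptions; only the env var names come from the discussion):

```python
import os
import tempfile

def resolve_api_token(config_key=None):
    """Hypothetical lookup order: per-detector config key, env var, mounted secret file."""
    if config_key:
        return config_key
    token = os.getenv("DETECTIONS_API_KEY")
    if token:
        return token
    key_file = os.getenv("DETECTIONS_API_KEY_FILE")
    if key_file and os.path.isfile(key_file):
        with open(key_file) as f:
            # Secrets mounted as files often end with a newline; strip it
            return f.read().strip()
    return None

# Simulate a Kubernetes/OpenShift Secret mounted as a file
with tempfile.NamedTemporaryFile("w", suffix=".key", delete=False) as f:
    f.write("s3cr3t\n")
    secret_path = f.name

os.environ.pop("DETECTIONS_API_KEY", None)
os.environ["DETECTIONS_API_KEY_FILE"] = secret_path
token = resolve_api_token()
print(token)  # s3cr3t
os.unlink(secret_path)
```

Reading the file at request time (rather than once at startup) also picks up rotated secrets, at the cost of a small per-request stat.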
if _http_session is None:
    async with _session_lock:
        if _http_session is None:
            _http_session = aiohttp.ClientSession()
aiohttp.ClientSession() is currently created with no support for custom certs, is my understanding correct?
Yes, aiohttp.ClientSession() is created with default SSL settings and doesn't support custom CA certificates.
I initially considered leaving SSL configuration to be handled at the infrastructure level (service mesh, system trust store) to keep the detector client focused on the API integration logic. However, after some thought, adding SSL support at the application level provides more deployment flexibility—it works regardless of how the infrastructure is configured and makes local development easier when testing against services with self-signed certificates.
I've added SSL configuration support—the session now accepts custom CA certificates via DETECTIONS_API_CA_CERT environment variable (for Kubernetes secret volumes) while maintaining backward compatibility with default system certificates.
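A minimal sketch of the CA-bundle handling described above (the helper name is illustrative; only the DETECTIONS_API_CA_CERT variable comes from the discussion):

```python
import os
import ssl

def build_ssl_context():
    """Use DETECTIONS_API_CA_CERT as a CA bundle if set, else the system trust store."""
    ca_cert = os.getenv("DETECTIONS_API_CA_CERT")
    if ca_cert and os.path.isfile(ca_cert):
        # cafile replaces the default CAs with the mounted bundle
        return ssl.create_default_context(cafile=ca_cert)
    return ssl.create_default_context()

# The resulting context could then be handed to the HTTP client,
# e.g. aiohttp.TCPConnector(ssl=ctx) when building the ClientSession
ctx = build_ssl_context()
print(isinstance(ctx, ssl.SSLContext))  # True
```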
"""Aggregated result from multiple detectors"""

allowed: bool = Field(description="Whether content passed all detectors")
reason: str = Field(description="Summary of detection results")
I am not sure if reason: and unavailable_detectors: are strictly needed, what do you think?
Both fields are not strictly necessary but provide significant value for usability and maintainability:
reason field:
- Provides pre-formatted human-readable summaries (e.g., "Blocked by 2 detectors: toxicity, jailbreak")
- While this could be reconstructed from the blocking_detectors list, it would require every caller to duplicate the formatting logic
- Currently used for logging and makes future user-facing error messages trivial to implement
- Ensures consistent messaging format across all callers
unavailable_detectors field (more useful than the reason field):
- Separates infrastructure failures from content violations in a single field check
- Without it, callers would need to iterate through blocking_detectors and filter by error labels (e.g., d.label in ["ERROR", "TIMEOUT"])
- Used in Colang flows for fail-closed behavior when detectors are unavailable
- Makes the distinction between "detector down" vs "content blocked" explicit and immediate
While both could technically be derived from the existing detector result lists, having them as dedicated fields prevents code duplication and makes the common use cases (logging summaries, handling system errors) cleaner and less error-prone.
if not hasattr(config, "rails") or not hasattr(config.rails, "config"):
    log.warning("Configuration incomplete")
    return AggregatedDetectorResult(
        allowed=True,
Yes, changed to allowed=False to maintain fail-closed behavior on configuration errors. This ensures broken configurations block content rather than allowing it through.
I think there are inconsistencies in the allowing / blocking logic in this file; please take a look and streamline it.
At present, if I read it correctly, there are at least these options:
- No config at all -> allowed=False (block)
- Config exists but incomplete -> allowed=True (allow)
- Detector not in config -> allowed=True (allow)
- No text -> allowed=True (allow)
Not sure if this fail-open behaviour is the way to go
Fixed: I've streamlined the logic to consistently fail-closed:
Fixed behavior:
No config at all → allowed=False (unchanged)
Config incomplete → allowed=False (fixed - was True)
Detector not in config → allowed=False (fixed - was True)
No text → Removed early return (now flows through detectors)
Empty messages now go through the full detection pipeline as you suggested earlier. The API naturally returns "No detections found" for empty strings, resulting in allowed=True through the proper process rather than via shortcut.
All configuration/setup errors now consistently fail-closed for safety.
05b9487 to
f87cb9b
Compare
- Implement base interface pattern for extensible detector clients
- Add DetectionsAPIClient for v1/text/contents protocol
- Support configuration-driven detector management via ConfigMap
- Add comprehensive documentation and deployment guide
Code quality improvements:
- Standardize all imports to absolute paths for consistency
- Remove dead code and simplify helper methods per API spec
- Replace tuple pattern with explicit zip() for clarity

Enhanced error handling:
- Add HTTP status differentiation (404/422/500/503)
- Add null-safety checks for incomplete config chains
- Preserve constructor validation error handling

Output guardrails support:
- Add bot_message extraction for output rails
- Implement priority-based message type detection
- Support both input and output guardrail flows

Bug fixes:
- Fix inconsistent Colang responses with static messages
- Add cleanup_http_session() to prevent session leaks
- Standardize metadata structure with 'passed' flag

Testing:
- Add comprehensive unit tests
- Remove KServeDetectorConfig (moved to separate KServe refactor PR)
- Replace getattr() with direct attribute access for Pydantic configs
- Add file-based secret support (DETECTIONS_API_KEY_FILE for Kubernetes volumes)
- Add custom SSL certificate support (DETECTIONS_API_CA_CERT for OpenShift)
- Fix non-200 HTTP responses to allow subclass status differentiation
- Standardize fail-closed behavior for all configuration errors
- Remove commented empty message handling (now flows through detectors)
- Update documentation: SSL config, file-based secrets, authentication priority

All tests passing (109 tests, 97% coverage)
…lication

- Add SYSTEM_ERROR_LABELS constant to base.py (single source of truth)
- Add SERVER_ERROR handling for HTTP 5xx responses in detections_api.py
- Add config incomplete checks before accessing rails.config in actions.py
- Add detections_api_generate_block_message action for dynamic block messages
- Fix env var names in tests (DETECTOR_API_* instead of DETECTIONS_API_*)
- Update test assertion for SERVER_ERROR label on HTTP 500

All tests passing (109 tests)
bda557d to
3d09842
Compare
Description
This PR adds support for the Detections API v1/text/contents protocol, enabling NeMo Guardrails to communicate with external detector services that implement this standardized interface (e.g., TrustyAI guardrails-detectors, FMS Guardrails Orchestrator detectors).
Key Changes:

Base Interface Pattern: Introduced a BaseDetectorClient abstract class that eliminates code duplication when supporting multiple detector API protocols. Common logic (HTTP communication, session management, authentication, error handling) is shared, while API-specific logic (request/response formats) is isolated in subclass implementations.

Detections API Client: Implemented DetectionsAPIClient, which builds requests of the form {"contents": [text], "detector_params": {}} and parses nested responses of the form [[{detection1}, {detection2}]].

Configuration Support: Added DetectionsAPIConfig to RailsConfigData, enabling ConfigMap-driven detector management without code changes.

Action Functions: Implemented detections_api_check_all_detectors() and detections_api_check_detector() for NeMo rails.co integration, with parallel execution and proper error separation (system errors vs content violations).

Comprehensive Documentation: Added a deployment guide with a Granite Guardian HAP example, testing instructions, and a guide for adding new detectors.

Design Benefits: new detector protocols need to implement the build_request() and parse_response() methods only.

Testing Performed:
Related Issue(s)
Addresses the need for standardized detector API integration to support multiple detector service protocols (Detections API, KServe V1, future protocols) through a unified, extensible architecture.
Checklist